Water-borne diseases including cholera, typhoid, diarrhea, and hepatitis A remain a severe public health burden in rural Northeast India, affecting over 37 million people annually. Existing surveillance systems suffer from irregular manual testing, paper-based reporting, and near-total failure during monsoon floods. This paper presents AquaSentials, a smart health surveillance and early-warning platform that combines low-cost solar-powered IoT water-quality sensors, an offline-first multilingual mobile application, community-driven health reporting by ASHA workers, and an ensemble AI/ML outbreak-prediction engine. The system operates over a hybrid Wi-Fi/LoRaWAN/2G GSM connectivity stack with local SD-card buffering. The ML pipeline, comprising a Random Forest classifier, an Isolation Forest anomaly detector, and a Facebook Prophet forecaster, was trained and evaluated on a 1,820-sample hybrid dataset constructed from WHO/NFHS-5 statistical distributions. On the binary outbreak classification task, the Random Forest classifier achieves 0.659 accuracy, 0.727 F1-score, and 0.705 ROC-AUC, outperforming the Decision Tree baseline (AUC 0.597). The Isolation Forest flags 10.4% of readings as anomalous for ASHA field verification. Simulation results indicate the potential for meaningfully faster outbreak detection than traditional paper-based surveillance, subject to field validation.
Introduction
The Northeastern Region (NER) of India is highly vulnerable to water-borne diseases because of contaminated surface water, poor sanitation, and weak real-time health surveillance; traditional systems rely on slow manual reporting. Floods exacerbate these risks by disrupting transport, power, and network infrastructure.
AquaSentials is an integrated IoT-AI platform designed for rural NER to address these challenges. It combines: (1) solar-powered multi-sensor IoT nodes for real-time water-quality monitoring (pH, TDS, turbidity, temperature/humidity, and biosensor readings), (2) offline-first, multilingual mobile applications for community health reporting, (3) an AI/ML engine for outbreak prediction using Random Forest, Isolation Forest, and Prophet models, and (4) role-differentiated dashboards and automated alerts.
The system uses hybrid connectivity (Wi-Fi → LoRaWAN → 2G GSM) with local SD-card buffering, so that readings are retained during connectivity outages and forwarded once a link is available again. A cloud backend with MongoDB, InfluxDB, PostgreSQL, Redis, and Elasticsearch supports storage and real-time processing. Villagers can report symptoms anonymously, view water-quality status, and interact with a multilingual AI chatbot. Administrators receive predictive alerts and visualized data for decision-making, enabling a timely response to prevent outbreaks.
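The fallback order and local buffering described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the AquaSentials firmware: the class name `BufferedUplink`, the send-function interface (a callable returning `True` on success), and the in-memory queue standing in for the SD card are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Reading = Dict[str, float]
# (link name, send function). The send function is assumed to return True on
# success and False (or raise OSError) on failure. This interface is hypothetical.
Link = Tuple[str, Callable[[Reading], bool]]


class BufferedUplink:
    """Try each link in priority order; keep readings locally when all fail."""

    def __init__(self, links: List[Link]) -> None:
        self.links = links
        # In-memory stand-in for the SD-card buffer (assumption).
        self.buffer: Deque[Reading] = deque()

    def _try_send(self, reading: Reading) -> bool:
        """Return True as soon as any link accepts the reading."""
        for _name, send in self.links:
            try:
                if send(reading):
                    return True
            except OSError:
                continue  # treat a link error as a failed attempt
        return False

    def submit(self, reading: Reading) -> int:
        """Queue a reading, then flush oldest-first.

        Returns the number of readings still waiting in the buffer.
        """
        self.buffer.append(reading)
        while self.buffer:
            if not self._try_send(self.buffer[0]):
                break  # all links down: stop and keep the backlog
            self.buffer.popleft()
        return len(self.buffer)
```

A node would construct the uplink with its links in priority order, for example `BufferedUplink([("wifi", send_wifi), ("lorawan", send_lora), ("gsm", send_gsm)])`, where the three send functions are placeholders for the real radio drivers. Flushing oldest-first preserves the time order of readings after an outage.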
Conclusion
This paper has presented AquaSentials, a smart community health surveillance and early-warning platform addressing five core failure modes in NER disease surveillance. The system integrates solar-powered IoT water-quality sensing, an offline-first multilingual mobile application, an ensemble AI/ML prediction engine, and role-differentiated dashboards with automated alerting.
Experimental evaluation on a 1,820-sample hybrid dataset shows that the Random Forest classifier achieves 0.705 ROC-AUC and 0.727 F1-score, outperforming the Decision Tree baseline (AUC 0.597). The Isolation Forest flags 10.4% of readings as anomalous for ASHA field verification. System-level simulation suggests the potential for meaningfully faster outbreak detection and near-zero data loss compared with traditional surveillance. These results establish a quantitative baseline for comparison with future models trained on real deployment data.
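For readers comparing future models against this baseline, the sketch below shows the standard definitions of accuracy, F1-score, and ROC-AUC for a binary task in plain Python. The function names are illustrative, and the toy labels in the usage note are made up; they are not the paper's data or evaluation code.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all samples that are classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall; true negatives play no role."""
    tp, fp, fn, _tn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg), which is
    fine for small evaluation sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

As a usage example, with `y_true = [1, 1, 1, 0, 0, 0]` and scores `[0.9, 0.8, 0.4, 0.7, 0.3, 0.2]` thresholded at 0.5, accuracy and F1 are both 4/6 and ROC-AUC is 8/9. Because F1 ignores true negatives, it can sit above accuracy on the same predictions, as in the 0.727 versus 0.659 figures reported for the Random Forest classifier.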
Future work will focus on: (i) field validation on real sensor and case-report data from pilot villages in Assam and Meghalaya; (ii) extension to arsenic and fluoride contamination detection; (iii) federated learning across district nodes; and (iv) integration with the National Disease Surveillance Portal (NDSP) for automated national reporting.
To the best of our knowledge, this is among the first integrated IoT–AI frameworks tailored specifically for water-borne disease surveillance in the Northeast India context.